Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis
Abstract
Hidden Markov Model (HMM) based speech synthesis that uses a Decision Tree (DT) for duration prediction is known to produce over-averaged rhythm. To alleviate this problem, this paper proposes a two-level duration prediction method combined with outlier removal. The method exploits the accurate regression capability of the Extreme Learning Machine (ELM) for phone-level duration prediction, and the ability of the DT to distribute durations over states for state-level duration prediction. Experimental results showed that the method decreased the RMSE of phone durations, increased the fluctuation of syllable durations, and achieved a preference score of 63.75% in the preference evaluation. Furthermore, the method does not require laborious manual alignment of the training corpus.
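The two prediction levels described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives no feature set, hidden-layer size, activation function, or outlier criterion, so `elm_fit`, `elm_predict`, `remove_outliers`, and `distribute_to_states` are hypothetical helpers, and the 3-standard-deviation outlier rule is an assumption. The ELM part follows the usual recipe: random, untrained input weights and biases, with output weights solved by least squares through the Moore-Penrose pseudoinverse. For the state level, the sketch substitutes the Gaussian state-duration rule common in HMM-based synthesis, d_k = μ_k + ρσ_k² with ρ = (T − Σμ_k) / Σσ_k², for whatever statistics the paper's DT leaves actually supply.

```python
import numpy as np


def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit a single-hidden-layer Extreme Learning Machine regressor.

    Input weights and biases are random and never trained; only the
    output weights are solved in closed form (least squares)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # Moore-Penrose solution
    return W, b, beta


def elm_predict(model, X):
    """Predict phone durations with a fitted ELM."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


def remove_outliers(X, y, k=3.0):
    """Hypothetical filter: drop training samples whose duration lies
    more than k standard deviations from the mean duration."""
    keep = np.abs(y - y.mean()) <= k * y.std()
    return X[keep], y[keep]


def distribute_to_states(total, means, variances):
    """Split a phone duration over HMM states with the Gaussian rule
    d_k = mu_k + rho * var_k, rho = (total - sum(mu)) / sum(var).
    The state durations sum to `total`; rounding to whole frames and a
    one-frame minimum are omitted for brevity."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    rho = (total - means.sum()) / variances.sum()
    return means + rho * variances


if __name__ == "__main__":
    # Synthetic toy data standing in for context features and phone
    # durations in frames; not taken from the paper.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = 16 + 3 * X[:, 0] + rng.normal(scale=1.0, size=200)

    X, y = remove_outliers(X, y)                 # outlier removal
    model = elm_fit(X, y, n_hidden=50)           # phone level (ELM)
    phone_dur = elm_predict(model, X[:1])[0]

    # Hypothetical per-state means/variances, as a DT leaf might supply.
    states = distribute_to_states(phone_dur, [2, 4, 5, 3, 2], [1, 2, 3, 2, 1])
    print(round(phone_dur, 2), np.round(states, 2))
```

Keeping the phone-level total fixed and only redistributing it across states reflects the division of labour the abstract describes: the regressor decides how long the phone is, and the state-level model decides how that time is shared. With a very short target duration the rule can return negative state durations, so a practical system would typically clamp and round the result to whole frames.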
Similar articles
Analysis of Duration Prediction Accuracy in HMM-Based Speech Synthesis
Appropriate phoneme durations are essential for high quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. Use of rich context features enables synthesis without high-level linguistic knowledge. In this paper we analyze the accuracy ...
Automatic methods for lexical stress assignment and syllabification
Improvements in automatic lexical stress assignment and syllabification can increase the quality of text-to-speech synthesis as well as decrease the memory requirements for dictionaries. Several methods were evaluated. Machine-learning based methods are preferred since they easily adapt to multiple languages. For stress prediction, encouraging results were obtained by combining a decision tree ap...
Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state du...
Duration modeling for HMM-based speech synthesis
This paper proposes a new approach to state duration modeling for HMM-based speech synthesis. A set of state durations of each phoneme HMM is modeled by a multi-dimensional Gaussian distribution, and duration models are clustered using a decision tree based context clustering technique. In the synthesis stage, state durations are determined by using the state duration models. In this paper, we ...
Tone Question of Tree Based Context Clustering for Hidden Markov Model Based Thai Speech Synthesis
Problem statement: In HMM-based Thai speech synthesis, tone is an important issue that affects the intelligibility of the synthesized speech. Tone distortion resulting from imbalance of the training data should be appropriately treated. Approach: This study described an HMM-based speech synthesis system for Thai language. In the system, spectrum, pitch and state duration are modeled simulta...